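Despite recent progress in the field of causal inference, there is to date no agreed-upon methodology for estimating treatment effects from collections of observational data. The consequence for clinical practice is that, when results from a randomized trial are lacking, there is no guidance on what appears to be effective in a real-world scenario. This paper proposes a pragmatic methodology for obtaining preliminary but robust estimates of treatment effects from observational studies, giving front-line clinicians a degree of confidence in their treatment strategy. Our study design is applied to an open problem: estimating the treatment effect of the proning maneuver on COVID-19 intensive care patients.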
translated by 谷歌翻译
Inspired by foundational studies in classical and quantum physics, and by information retrieval studies in quantum information theory, we have recently proved that the notions of 'energy' and 'entropy' can be consistently introduced in human language and, more generally, in human culture. More explicitly, if energy is attributed to words according to their frequency of appearance in a text, then the ensuing energy levels are distributed non-classically, namely, they obey Bose-Einstein, rather than Maxwell-Boltzmann, statistics, as a consequence of the genuinely 'quantum indistinguishability' of the words that appear in the text. Secondly, the 'quantum entanglement' due to the way meaning is carried by a text reduces the (von Neumann) entropy of the words that appear in the text, a behaviour which cannot be explained within classical (thermodynamic or information) entropy. We claim here that this 'quantum-type behaviour' is valid in general in human cognition, namely, any text is conceptually more concrete than the words composing it, which entails that the entropy of the overall text decreases. This result can be extended to human culture and its collaborative entities, which have lower entropy than their constituent elements. We use these findings to propose the development of a new 'non-classical thermodynamic theory for human cognition and human culture', which bridges concepts and quantum entities and agrees with some recent findings on the conceptual, not physical, nature of quantum entities.
In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns -- known in reinforcement learning as the value -- cannot account for the potential range of adverse or positive outcomes a decision may have. Therefore, we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time by taking both the future and accrued returns into consideration. In this paper, we propose two novel Monte Carlo tree search algorithms. Firstly, we present a Monte Carlo tree search algorithm that can compute policies for nonlinear utility functions (NLU-MCTS) by optimising the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Secondly, we propose a distributional Monte Carlo tree search algorithm (DMCTS) which extends NLU-MCTS. DMCTS computes an approximate posterior distribution over the utility of the returns, and utilises Thompson sampling during planning to compute policies in risk-aware and multi-objective settings. Both algorithms outperform the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.
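The gap between deciding on expected returns and deciding on the utility of a single execution can be illustrated with a small sketch. This is not NLU-MCTS or DMCTS; the utility function, the two actions, and their outcome distributions are hypothetical values chosen only to show how adding accrued and future returns before applying a nonlinear utility can change the preferred action.

```python
import math

# Hypothetical risk-averse utility over the TOTAL return of one execution
# (accrued return so far plus future return). Not taken from the paper.
def utility(total_return: float) -> float:
    return 1.0 - math.exp(-0.5 * total_return)

def expected_value(outcomes):
    """Plain expected future return: the 'value' in standard RL."""
    return sum(p * r for p, r in outcomes)

def expected_utility(outcomes, accrued):
    """Expected utility of accrued + future return for a single execution."""
    return sum(p * utility(accrued + r) for p, r in outcomes)

# Each action maps to a list of (probability, future return) pairs.
# These numbers are illustrative assumptions.
ACTIONS = {
    "safe": [(1.0, 2.0)],
    "risky": [(0.5, 6.0), (0.5, -1.0)],
}
ACCRUED = 0.0  # return already collected before this decision

best_by_value = max(ACTIONS, key=lambda a: expected_value(ACTIONS[a]))
best_by_utility = max(
    ACTIONS, key=lambda a: expected_utility(ACTIONS[a], ACCRUED)
)

print("best by expected return:", best_by_value)
print("best by expected utility:", best_by_utility)
```

With these assumed numbers the risky action has the higher expected return, while the safe action has the higher expected utility, which is the kind of disagreement the abstract describes for single-execution settings.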
Obstacles on the sidewalk often block the path, limiting passage and resulting in frustration and wasted time, especially for citizens and visitors who use assistive devices (wheelchairs, walkers, strollers, canes, etc). To enable equal participation and use of the city, all citizens should be able to perform and complete their daily activities in a similar amount of time and effort. Therefore, we aim to offer accessibility information regarding sidewalks, so that citizens can better plan their routes, and to help city officials identify the location of bottlenecks and act on them. In this paper we propose a novel pipeline to estimate obstacle-free sidewalk widths based on 3D point cloud data of the city of Amsterdam, as the first step to offer a more complete set of information regarding sidewalk accessibility.
Classifier-free guided diffusion models have recently been shown to be highly effective at high-resolution image generation, and they have been widely used in large-scale diffusion frameworks including DALL-E 2, Stable Diffusion and Imagen. However, a downside of classifier-free guided diffusion models is that they are computationally expensive at inference time, since they require evaluating two diffusion models, a class-conditional model and an unconditional model, tens to hundreds of times. To deal with this limitation, we propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from: given a pre-trained classifier-free guided model, we first learn a single model to match the output of the combined conditional and unconditional models, and then we progressively distill that model into a diffusion model that requires far fewer sampling steps. For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model using as few as 4 sampling steps on ImageNet 64x64 and CIFAR-10, achieving FID/IS scores comparable to those of the original model while being up to 256 times faster to sample from. For diffusion models trained in latent space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps, accelerating inference by at least 10-fold compared to existing methods on ImageNet 256x256 and LAION datasets. We further demonstrate the effectiveness of our approach on text-guided image editing and inpainting, where our distilled model is able to generate high-quality results using as few as 2-4 denoising steps.
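The "combined conditional and unconditional" output that the first distillation stage has to match can be sketched as follows. The weighting `(1 + w) * cond - w * uncond` is one common parameterization of classifier-free guidance and is an assumption here, as are the function names and the step counts; the abstract does not specify them.

```python
def guided_prediction(eps_cond, eps_uncond, w):
    """Combine conditional and unconditional model outputs element-wise.

    Assumed convention: (1 + w) * cond - w * uncond, where w is the
    guidance weight and w = 0 recovers the purely conditional output.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("outputs must have the same length")
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]

def network_evaluations(sampling_steps, models_per_step):
    """Total network forward passes needed to draw one sample."""
    return sampling_steps * models_per_step

# Toy outputs standing in for the two models' predictions at one step.
target = guided_prediction([1.0, 2.0], [0.5, 1.0], w=2.0)
print("distillation target at this step:", target)

# Hypothetical step counts: a guided teacher calls two models per step,
# a distilled student calls a single model for a handful of steps.
teacher_cost = network_evaluations(sampling_steps=100, models_per_step=2)
student_cost = network_evaluations(sampling_steps=4, models_per_step=1)
print("teacher evaluations:", teacher_cost, "student evaluations:", student_cost)
```

The sketch only shows where the savings come from: one model per step instead of two, and fewer steps. The speed-ups reported in the abstract depend on the actual step counts used there.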
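We aim to investigate the origin of the quantum structures of superposition, contextuality and entanglement in human perception itself, given how successfully they are used to model aspects of human cognition. Our analysis takes us from a simple quantum measurement model, via the way human perception incorporates the warping mechanism of categorical perception, to a quantum version of the prototype theory for concepts, which allows for dynamic contextuality when concepts are combined. Our study is rooted in an operational quantum axiomatics that leads to a state-context-property system for concepts. We illustrate our quantum prototype model and its interference when combining concepts with two examples worked out in detail.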
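Many real-world problems contain multiple objectives and agents, where a trade-off exists between objectives. Key to solving such problems is to exploit the sparse dependency structures that exist between agents. For example, in wind farm control a trade-off exists between maximising power and minimising stress on the system's components. Dependencies between turbines arise due to the wake effect. We model such sparse dependencies between agents as a multi-objective coordination graph (MO-CoG). In multi-objective reinforcement learning, a utility function is typically used to model a user's preferences over objectives, which may be unknown a priori. In this setting, a set of optimal policies must be computed. Which policies are optimal depends on which optimality criterion applies. If the utility function of a user is derived from multiple executions of a policy, the scalarised expected returns (SER) criterion must be optimised. If the utility of a user is derived from a single execution of a policy, the expected scalarised returns (ESR) criterion must be optimised. For example, wind farms are subject to constraints and regulations that must be adhered to at all times, so the ESR criterion must be optimised. For MO-CoGs, the state-of-the-art algorithms can only compute a set of optimal policies for the SER criterion, leaving the ESR criterion understudied. To compute a set of optimal policies under the ESR criterion, also known as the ESR set, distributions over the returns must be maintained. Therefore, to compute a set of optimal policies under the ESR criterion for MO-CoGs, we present a novel distributional multi-objective variable elimination (DMOVE) algorithm. We evaluate DMOVE in realistic wind farm simulations. Given that the returns in real-world wind farm settings are continuous, we use a model known as real-NVP to learn the continuous return distributions and calculate the ESR set.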
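The difference between the SER and ESR criteria can be made concrete with a toy return distribution. This is not DMOVE; the two-objective returns (power, stress), the stress limit, and the utility function below are hypothetical and serve only to show that applying the utility after taking the expectation (SER) and before taking it (ESR) can give different answers.

```python
STRESS_LIMIT = 1.0  # hypothetical regulatory limit on component stress

def utility(ret):
    """Hypothetical nonlinear utility: power counts only within the limit."""
    power, stress = ret
    return power if stress <= STRESS_LIMIT else 0.0

def expected_vector(dist):
    """Component-wise expectation of a list of (probability, vector) pairs."""
    n_objectives = len(dist[0][1])
    return tuple(
        sum(p * r[i] for p, r in dist) for i in range(n_objectives)
    )

def ser(dist):
    """Scalarised expected returns: utility of the expected return vector."""
    return utility(expected_vector(dist))

def esr(dist):
    """Expected scalarised returns: expectation of per-execution utility."""
    return sum(p * utility(r) for p, r in dist)

# One policy whose executions violate the stress limit half of the time.
policy_returns = [(0.5, (10.0, 0.5)), (0.5, (10.0, 1.5))]

print("SER:", ser(policy_returns))
print("ESR:", esr(policy_returns))
```

Under SER the violations average out and the policy looks fully acceptable, whereas ESR penalises each execution that breaks the limit. Computing ESR needs the full return distribution rather than only its mean, which is consistent with the abstract's point that distributions over returns must be maintained.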
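As a result of the identification of 'identity' and 'indistinguishability', and of strong experimental evidence for the presence of the associated Bose-Einstein statistics in human cognition and language, we argued in previous work for an extension of the research domain of quantum cognition. In addition to quantum complex vector spaces and quantum probability models, we showed that quantization itself, with words as quanta, is relevant and potentially important to human cognition. In the present work, we build on this result and introduce a powerful radiation quantization scheme for human cognition. We show that the lack of independence of Bose-Einstein statistics, compared to Maxwell-Boltzmann statistics, can be explained by the presence of a 'meaning dynamics', which causes words to be attracted to the same words. As a result, words clump together in the same states, a phenomenon well known for photons in the early years of quantum mechanics, which led to fierce disagreements between Planck and Einstein. Using a simple example, we introduce all the elements needed to obtain a better and more detailed view of this 'meaning dynamics', such as micro and macro states, and Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac numbers and weights, and we compare this example and its graphs with the radiation quantization scheme of a Winnie the Pooh story, also with its graphs. By connecting a concept directly to human experience, we show that entanglement is a necessity for preserving the 'meaning dynamics' we identified, and it becomes clear in what way Fermi-Dirac statistics addresses human memory: there, in spaces with internal parameters, different words can be assigned.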
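The three kinds of state counting mentioned above can be sketched with the standard combinatorial formulas. This is a textbook illustration, not the paper's worked example: the choice of two words and three states, and the assumption that all microstates are equally likely, are made here only to show why indistinguishable items "clump" more often under Bose-Einstein counting.

```python
from fractions import Fraction
from math import comb

def maxwell_boltzmann_count(n_words, n_states):
    """Distinguishable words: each word independently picks a state."""
    return n_states ** n_words

def bose_einstein_count(n_words, n_states):
    """Indistinguishable words, any number of words per state."""
    return comb(n_words + n_states - 1, n_words)

def fermi_dirac_count(n_words, n_states):
    """Indistinguishable words, at most one word per state."""
    return comb(n_states, n_words)

def prob_two_words_share_state(count_fn, n_states):
    """Probability that two words land in the same state, assuming all
    microstates under the given counting are equally likely."""
    # For two words, exactly n_states microstates have both words together.
    return Fraction(n_states, count_fn(2, n_states))

N_STATES = 3
print("MB microstates:", maxwell_boltzmann_count(2, N_STATES))
print("BE microstates:", bose_einstein_count(2, N_STATES))
print("FD microstates:", fermi_dirac_count(2, N_STATES))
print("P(same state) under MB:",
      prob_two_words_share_state(maxwell_boltzmann_count, N_STATES))
print("P(same state) under BE:",
      prob_two_words_share_state(bose_einstein_count, N_STATES))
```

Under these assumptions two words share a state with higher probability under Bose-Einstein counting than under Maxwell-Boltzmann counting, while Fermi-Dirac counting excludes shared states altogether.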
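Multi-agent reinforcement learning (MARL) enables us to create adaptive agents in challenging environments, even when the agents have limited observations. Modern MARL methods have so far focused on finding factorized value functions. While this approach has proven successful, the resulting methods have convoluted network structures. We take a radically different approach and build on the structure of independent Q-learners. Inspired by influence-based abstraction, we start from the observation that compact representations of the observation-action histories can be sufficient to learn close-to-optimal decentralized policies. Combining this observation with a dueling architecture, our algorithm, LAN, represents these policies as separate individual advantage functions w.r.t. a centralized critic. These local advantage networks condition only on a single agent's local observation-action history. The centralized value function conditions on the agents' representations as well as on the full state of the environment. The value function, which is cast aside before execution, serves as a stabilizer that coordinates the learning and is used to formulate DQN targets during learning. In contrast with other methods, this enables LAN to keep the number of parameters of its centralized network independent of the number of agents, without imposing additional constraints such as monotonic value functions. When evaluated on the StarCraft Multi-Agent Challenge benchmark, LAN shows state-of-the-art performance and scores more than 80% wins on two previously unsolved maps (among them `3S5Z_VS_3S6Z'), leading to a 10% improvement over QPLEX in average performance on the 14 maps. Moreover, when the number of agents becomes large, LAN uses significantly fewer parameters than QPLEX or even QMIX. We thus show that LAN's structure forms a key improvement that helps MARL methods remain scalable.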
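We study the problem of multiple agents learning concurrently in a multi-objective environment. Specifically, we consider two agents that repeatedly play a multi-objective normal-form game. In such games, the payoffs resulting from joint actions are vector-valued. Taking a utility-based approach, we assume that a utility function exists that maps vectors to scalar utilities, and we consider agents that aim to maximise the utility of expected payoff vectors. As agents do not necessarily know their opponent's utility function or strategy, they must learn optimal policies for interacting with each other. To help agents arrive at adequate solutions, we introduce four novel preference communication protocols for both cooperative and self-interested communication. Each approach describes a specific protocol by which one agent communicates preferences over its actions and how the other agent responds. These protocols are subsequently evaluated on a set of five benchmark games against baseline agents that do not communicate. We find that preference communication can drastically alter the learning process and lead to the emergence of cyclic Nash equilibria, which had not previously been observed in this setting. Additionally, we introduce a communication scheme in which agents must learn when to communicate. For agents in games with Nash equilibria, we find that communication can be beneficial but is difficult to learn when agents have different preferred equilibria. When this is not the case, agents become indifferent to communication. In games without Nash equilibria, our results show differences across learning rates. With faster learners, we observe that explicit communication becomes more prevalent, at around 50% of the time, as it helps them learn a compromise joint policy. Slower learners retain this pattern to a lesser degree, but show increased indifference.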
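The quantity each agent optimises in this setting, the utility of the expected payoff vector, can be sketched for a small two-player game. The payoff table, the mixed strategies, and the product utility below are hypothetical and are not taken from the benchmark games in the paper; the communication protocols themselves are not modelled.

```python
def expected_payoff(payoffs, row_strategy, col_strategy):
    """Expected payoff vector under independent mixed strategies.

    payoffs[i][j] is the vector-valued payoff when the row player picks
    action i and the column player picks action j.
    """
    n_objectives = len(payoffs[0][0])
    return tuple(
        sum(
            row_strategy[i] * col_strategy[j] * payoffs[i][j][k]
            for i in range(len(row_strategy))
            for j in range(len(col_strategy))
        )
        for k in range(n_objectives)
    )

def utility(vec):
    """Hypothetical nonlinear utility that rewards balanced objectives."""
    return vec[0] * vec[1]

# Hypothetical 2x2 game with two objectives per payoff.
PAYOFFS = [
    [(4.0, 0.0), (2.0, 2.0)],
    [(2.0, 2.0), (0.0, 4.0)],
]

pure = expected_payoff(PAYOFFS, [1.0, 0.0], [1.0, 0.0])
mixed = expected_payoff(PAYOFFS, [0.5, 0.5], [0.5, 0.5])

print("pure joint action:", pure, "utility:", utility(pure))
print("uniform mixing:", mixed, "utility:", utility(mixed))
```

With a nonlinear utility, a mixed joint strategy can score higher than any single joint action, which is one reason the strategies agents settle on depend on their utility functions as well as on the payoff table.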